Uncontrolled Interthread Interference in Main Memory Can Destroy Individual Threads’ Memory-Level Parallelism, Effectively Serializing the Memory Requests of a Thread Whose Latencies Would Otherwise Have Largely

Authors

Onur Mutlu

Thomas Moscibroda

Abstract

The main memory (dynamic RAM, or DRAM) system is a major limiter of computer system performance. In modern processors, which are overwhelmingly multicore (or multithreaded), the concurrently executing threads share the DRAM system, and threads running on different cores can delay each other through resource contention. One thread’s memory requests can cause DRAM bank conflicts, row-buffer conflicts, and data/address bus conflicts with another’s. As the number of on-chip cores increases, the pressure on the DRAM system increases, as does the interference among threads sharing the system. Unfortunately, many conventional DRAM controllers are unaware of this interthread interference. They schedule requests simply to maximize DRAM data throughput. For example, the commonly used row-hit-first scheduling policy (FR-FCFS, or first ready, first come, first served) is thread unaware. Uncontrolled interthread interference in DRAM scheduling results in two major problems. First, as previous work showed, a state-of-the-art DRAM controller can unfairly prioritize some threads while starving more important threads for long time periods as they wait to access memory (see the “Related Work on Memory Controllers” sidebar). For example, FR-FCFS unfairly prioritizes threads with high row-buffer hit rates over those with low row-buffer hit rates. Similarly, an oldest-first scheduling policy implicitly prioritizes memory-intensive threads over memory-nonintensive ones. In fact, it is possible to write programs that deny DRAM service to more important programs running on the same chip, as we showed in our previous work. Such...
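The row-hit-first behavior described in the abstract can be illustrated with a small sketch. This is not the authors' controller or simulator; it is a simplified single-bank model under stated assumptions: all requests are already queued, a request is a row hit when it targets the currently open row, and the scheduler serves the oldest row hit if one exists, otherwise the oldest request. The names `Request` and `fr_fcfs_order`, and the workload itself, are hypothetical.

```python
from collections import namedtuple

# Hypothetical request record: arrival order, issuing thread, target DRAM row.
Request = namedtuple("Request", ["arrival", "thread", "row"])


def fr_fcfs_order(requests, open_row=None):
    """Return the service order for one bank under a simplified FR-FCFS.

    Assumption: every request is already in the queue, so 'arrival' is used
    only to break ties by age. The scheduler never looks at 'thread', which
    is what makes the policy thread unaware.
    """
    pending = sorted(requests, key=lambda r: r.arrival)
    order = []
    while pending:
        # Row hits (requests to the open row) are preferred over older misses.
        hits = [r for r in pending if r.row == open_row]
        chosen = hits[0] if hits else pending[0]
        pending.remove(chosen)
        open_row = chosen.row  # serving a request leaves its row open
        order.append(chosen)
    return order


# Hypothetical workload: thread A streams through row 5 (high row-buffer hit
# rate); thread B touches two different rows (low row-buffer hit rate).
requests = [
    Request(0, "A", 5),
    Request(1, "B", 9),
    Request(2, "A", 5),
    Request(3, "A", 5),
    Request(4, "B", 2),
    Request(5, "A", 5),
]

order = fr_fcfs_order(requests)
served_threads = [r.thread for r in order]
print(served_threads)  # ['A', 'A', 'A', 'A', 'B', 'B']
```

In this toy workload, thread B's request that arrived second is served fifth, because every later request from thread A hits the open row and is scheduled first. The model ignores timing constraints, multiple banks, and bus contention, so it shows only the prioritization effect, not realistic latencies.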


Similar resources

Enhancing the Performance and Fairness of Shared DRAM Systems with Parallelism-Aware Batch Scheduling

Onur Mutlu, Thomas Moscibroda (Microsoft Research). In a chip-multiprocessor (CMP) system, the DRAM system is shared among cores. In a shared DRAM system, requests from a thread can not only delay requests from other threads by causing bank/bus/row-buffer conflicts but they can also destro...


Parallelism-Aware Batch Scheduling: Paving the Way to High-Performance and Fair Memory Controllers

In modern processors, the DRAM system is shared among concurrently executing threads. Memory requests from a thread can delay requests from other threads by causing bank/bus/row-buffer conflicts. Conventional DRAM controllers are unaware of inter-thread interference, which causes two problems. First, some threads are unfairly penalized and denied DRAM service for long time periods. Second, as we...


Memory Compression Coordinated and Optimized Prefetching in GPU Architectures

Traditionally, GPU architectures have been primarily focused on throughput and latency hiding. However, as the computational power of GPUs continues to scale with Moore’s law, an increasing number of applications are becoming limited by memory bandwidth [1]. Also, data locality and reuse are becoming increasingly important with power-limited technology scaling. The energy spent on off-chip memo...


Prefetch Threads for Database Operations on a Simultaneous Multi-threaded Processor

Simultaneous Multi-threading (SMT) has been developed to increase instruction level parallelism by allowing instructions from a different thread to run during a stall. Inter-thread cache interference, however, might limit the benefit of running multiple independent threads. SMT processors can be utilized in a different model, where a helper thread is used to prefetch cache blocks for the main e...


Multigranular Thread Support in WaveScalar

WaveScalar is a recently proposed scalable microarchitecture. The original WaveScalar research developed and evaluated an ISA and microarchitecture that efficiently executes a single, coarse-grain thread. In this paper, we expand that design to support multiple, simultaneously executing threads. Four mechanisms make this possible: (1) instructions that enable and disable wave-ordered memory; (2...




Publication date: 2009
